Frontier Model Forum
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
Tegan McCaslin, Jide Alaga, Samira Nedungadi, Seth Donoughe, Tom Reed, Rishi Bommasani, Chris Painter, Luca Righetti
Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations, including what they test, how they are conducted, and how their results inform decisions, is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource that helps AI developers present evaluation results more clearly, and (2) help third parties determine whether model reports provide enough detail to assess the rigor of their ChemBio evaluations. We demonstrate our proposed best practices concretely with "gold standard" examples, and we provide a three-page reporting template to make our recommendations easier for AI developers to implement.
AI Leaders Create Industry Watchdog as Government Scrutiny Grows
Facing calls to put guardrails on artificial intelligence development, a group of tech companies including Alphabet Inc.'s Google and OpenAI Inc. is creating an industry body to ensure that AI models are safe. The effort, also backed by AI startup Anthropic and Microsoft Corp., aims to consolidate the expertise of member companies and create benchmarks for the industry, according to a statement Wednesday. The group, known as the Frontier Model Forum, said it welcomed participation from other organizations working on large-scale machine-learning platforms. The fast proliferation of generative AI tools such as OpenAI's ChatGPT, which can create text, photos and even video based on simple prompts, has put pressure on tech giants to tread carefully. The companies involved in the Frontier Model Forum have already agreed, at the urging of the White House, to put safeguards in place before Congress potentially passes binding regulations.